ABSTRACT

Office automation systems are growing, both in use and in complexity.
The development of a database management system for the office automation
environment becomes a high priority, in order to provide an efficient and
reliable way to manage the information needs of the office. Therefore, the
specification of an 'ideal' database server for the office automation
environment becomes a key area of concern. In addition to providing traditional
database support, the ideal database server must also provide new database
support, in order to meet the many unique needs of office automation
environments. In this paper, we focus on the characterization and
specification of an ideal database server for the office automation environment. We
also consider how such an ideal database server can be effectively integrated
into the office automation environment. Further, we examine an experimental
database system, known as the multi-backend database system (MBDS), as a
candidate for the ideal database server in the office automation environment.
1. INTRODUCTION

As office automation systems (OAS) become more prevalent in the work
place, the need for database support in the office automation environment
(OAE) becomes a key issue. In this paper we attempt to provide the
characterization of an ideal database server for OAEs. The database server is used to
provide traditional as well as new database support in the OAE. In addition,
we study various approaches to the integration of the database server into an
OAE. In our characterization and study of an ideal server, we focus on the
use of an experimental database system, known as the multi-backend database
system (MBDS), as the server. Although MBDS may be far from ideal, it does
serve as a benchmark for measuring the other database servers for OAEs. In
the rest of this paper we examine how and why MBDS may be considered as a
database server for the OAE.

More specifically, in Section 2 we discuss the architecture and
characteristics of an ideal database server for the OAE. In Section 3 we briefly
describe the design and implementation of MBDS. In Section 4 we analyze how a
database server such as MBDS can be integrated into the OAE. The analysis
focuses on the multiple backend architecture of MBDS and how it satisfies
the architectural requirements of the ideal OAE database server. In Section
5, we analyze whether the unique design characteristics of MBDS meet the needs
of the OAE. Finally, in Section 6 we conclude this paper.

2. A CHARACTERIZATION OF AN IDEAL DATABASE SERVER

When characterizing an ideal database server for the OAE, we focus our
efforts in two directions. First, we consider the architectural requirements
of the ideal database server that will facilitate the smooth integration of
the ideal database server into the OAE. Second, we consider the necessary
database system features or characteristics of the ideal database server for
the OAE. In the following two sections, we examine these two considerations
for an ideal database server in the OAE.

2.1.  The  Architectural  Requirements 

The basic structure of the OAE consists of a group of workstations
connected using a local-area network (LAN) (see Figure 1, where a workstation is
denoted with the letter W in a square box). To successfully meet the needs of
this environment, the ideal database server must be integrated into the
existing OAE. The integration of the ideal database server into the OAE should be
smooth and have no ill-effect on the existing OAS. If the ideal database
server runs on a single workstation, it must be powerful enough to accommodate
the database management needs of the current and future OAE. Thus, it seems
logical that the ideal database server should consist of initially a few
workstations and later a number of workstations. With multiple workstations, the
ideal database server should reduce and distribute the database management
load across the multiple workstations.

Figure 1. The Basic OAS

Whether the workstations that make up the ideal database server act as
individual database systems, or cooperate to handle the database management
needs of the OAE, is also an issue. It may not be feasible in a given OAE to
distribute the database management functionality and load among different
database servers on the same network, since the OAS is not a distributed
database system. The OAE may require a central repository of data and programs
that is maintained and accessed via a single system, so that the data and
programs can be successfully shared throughout the OAE. Overall, the needs of
the OAE become a crucial concern when specifying the architectural
requirements of the ideal database server. In this consideration, an ideal database
server for OAEs should be configured as a centralized database system running
on multiple workstations.

2.2.  The  Six  Characteristics 


There are six major characteristics of an ideal database server. They
are software portability, software independence, auto-configurability,
survivability, versatility, and performance. Software portability provides the
ideal database server with the ability to be accessible on a wide range of
hardware systems. Specifically, the ideal database server should not be
restricted to a particular class of hardware and a specific type of operating
system. Instead, it should be portable across a wide range of workstations and
operating systems of the OAE. If the ideal database server is implemented on
multiple workstations, the software components of the server running on the
separate workstations should be sufficiently independent, so that the ideal
database server will not become inoperative when a node (i.e., either one of
the software components on a workstation or a workstation) becomes disabled.
Software independence among system components running on separate workstations
may eliminate software and hardware interdependencies and reduce the complexity
of the ideal database server.

When running on multiple workstations, the ideal database server should
be auto-configurable and reconfigurable. When the OAE grows, i.e., the number
of workstations in the OAS increases, or a workstation becomes disabled, the
ideal database server should be able to adjust itself for the addition or loss
of workstations. Such adjustment should require no new programming and no
modification to the existing software drivers. Further, it should incur no
disruption of the OAE or OAS. The ideal database server should also maintain
a consistent and up-to-date copy of the database. When a node in the OAE is
disabled, it is imperative that the ideal database server still be functional,
providing continuous, albeit limited, access to the remaining database. This
property is the survivability of the ideal database server.

The ideal database server should also be versatile, providing the user
with more than one way of accessing the database. In an OAE where there is a
large group of individuals from diverse backgrounds and with different
experiences in using database facilities, the ideal database server should provide
different database language interfaces in order to give the database
user various ways of accessing the database. Finally, the ideal database
server should be a database system that is oriented towards providing a
substantial level of performance. As time goes by, both the use of the ideal
database server and the volume of data and programs stored in the
database will increase. To meet the growing needs of the OAE, the ideal
database server must be able to expand as the OAE expands, and either maintain or
increase its performance.

3. THE NEED OF A DATABASE SERVER WITH MULTIPLE BACKEND CONFIGURATIONS

3.1. The Proposed Architecture for an Ideal Database Server

We advocate that the architecture of an ideal database server be
configured with one controller and multiple backends. As shown in Figure 2, the
controller and the backends are connected by a broadcast bus. When a
transaction is received from the host computer, the controller broadcasts the
transaction to all the backends. Each backend has a number of dedicated disk
drives. Since the database is distributed across the backends, a transaction
can be executed by all backends concurrently. Each backend maintains a queue
of transactions and schedules queries for execution independently of the other
backends, in order to maximize its access operations and to minimize its idle
time. On the other hand, the controller does very little work. It is
responsible for receiving and broadcasting transactions, routing results, and
assisting the backends in the insertion of new data. The backends do all the
database operations. Just how this architecture may have the six
characteristics of an ideal database server will be expounded in the following sections
by way of an experimental database system which has a similar
architectural configuration.
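
The division of labor between the controller and the backends can be
illustrated with a small Python sketch. It is only a model of the idea, not
the design of any actual system: the class names, the predicate-style request,
and the rule that a new record goes to the backend holding the fewest records
are all assumptions made for the example, and the backends are run one after
another here rather than concurrently.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Backend:
    """One backend: holds its own share of the records and a request queue."""
    records: list = field(default_factory=list)
    queue: deque = field(default_factory=deque)

    def enqueue(self, predicate):
        self.queue.append(predicate)

    def run_next(self):
        """Execute the oldest queued request against the local records only."""
        predicate = self.queue.popleft()
        return [r for r in self.records if predicate(r)]


class Controller:
    """Receives a request, broadcasts it, and merges the partial results."""

    def __init__(self, backends):
        self.backends = backends

    def insert(self, record):
        # Assumption: a new record goes to the backend holding the fewest records.
        target = min(self.backends, key=lambda b: len(b.records))
        target.records.append(record)

    def retrieve(self, predicate):
        for backend in self.backends:   # broadcast step
            backend.enqueue(predicate)
        results = []
        for backend in self.backends:   # each backend searches only its own share
            results.extend(backend.run_next())
        return results


controller = Controller([Backend() for _ in range(3)])
for population in (1000, 25000, 40000, 7000):
    controller.insert({"POPULATION": population})
large = controller.retrieve(lambda r: r["POPULATION"] > 5000)
```

Note that the controller never touches a record during retrieval; it only
forwards the request and concatenates what the backends return.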

3.2. The Multi-Backend Database System (MBDS) as a Database Server

To provide a centralized database system, MBDS uses one or more identical
minicomputers and their disk systems as database backends and a minicomputer
as the database controller to interface with multiple, dissimilar workstations
or mainframes. We shall refer to these workstations and mainframes as hosts
or host computers. User access to the centralized database is therefore
accomplished through a host computer which in turn communicates with the
controller. Multiple backends are configured in parallel. The original design
and analysis of MBDS are due to J. Menon [Hsia81a, Hsia81b]. An overview of
MBDS can be found in [He83], with an analysis of the message passing structure
in [Boyn83a]. The implementation and new design efforts are documented in
[Boyn83b, Demu84, Kerr82]. The database is distributed across all of the
backends. The database management functions are replicated in each backend,
i.e., all backends have identical software and hardware. They of course hold
different portions of the database.

There are some key issues to explore when considering MBDS for OAEs. The
current implementation of MBDS uses minicomputers for both the controller and
the backends. The original intent of the design was to implement a system
which utilizes microprocessor-based computer systems, Winchester-type disks
and an Ethernet-like broadcast bus. Unfortunately, these were not available
when the implementation of MBDS began in 1980. There are a number of reasons
for preferring microprocessor-based computer systems or workstations over the
traditional minicomputers. First, the 32-bit microprocessor is quickly
attaining a reputation as a dependable, versatile and fast computer system,
approaching the speed and performance of the minicomputers of five years ago.
Second, the microprocessor-based system is a cost-effective computer
system. This is important when considering that MBDS requires a minimum of two
computer systems. It also implies that MBDS can be expanded with relative
ease and minimal cost by the addition of backend microprocessor-based computer
systems.

Figure 2. The MBDS Hardware Organization

The placement of the user interface is also affected by the use of
microprocessor-based computer systems. The user interface provides access to
MBDS and is run either from a separate host computer system or as part of the
system on the backend controller. When the user interface is on a separate
host computer, the interface interacts with the controller via a bus. In
either case, the use of a similar (with respect to the controller and backend
hardware) microprocessor-based computer system for the user interface
increases the compatibility and the maintainability (with respect to the
hardware maintenance complexities and costs) of the database system.

The final major issue involves the ability of MBDS to support multiple
data model/language interfaces to the multi-backend database system (see
Appendix A). These multiple model/language interfaces allow the user to
access MBDS using the relational model/SQL language, the hierarchical
model/DL/I language, the entity-relationship model/Daplex language, or the
network model/CODASYL language. These interfaces also run on either a
separate host computer or the backend controller; as such, the issues
concerning the user interface also apply here.

One final note: in Appendix A, we provide a more detailed discussion of
the attribute-based data model, the attribute-based data language (ABDL), the
MBDS process structure, the system configurations (present and future), and
the multi-lingual capabilities of MBDS.

4. FIVE APPROACHES TO THE INTEGRATION OF MBDS INTO THE OAE

In this section, we examine how MBDS can be integrated into the office
automation environment. Our main focus is on ways to integrate MBDS into the
OAE, and the relative advantages and disadvantages of the integration
configurations. Recall that the basic OAS consists of a group of workstations
connected by a local-area network (LAN) such as an Ethernet [Metc76]. Such a
design was shown in Figure 1. We now consider the integration of MBDS into the
OAS. We approach the integration in five distinct ways.

In the first approach, MBDS is added on as a separate group of
workstations in the OAS, with its own LAN. We characterize this approach as the
non-integrated dual-LAN design. In this approach, the additional workstations
are dedicated to the database management operations. As such, they are
inaccessible for non-database activities. We provide the interface process
(which may include one or more language interfaces) as part of the
user-accessible workstation. The resulting OAS is shown in Figure 3. In this and
the remaining four approaches, the placement of the interface software (i.e.,
the number of workstations and which workstations have the interface
software) is left to the discretion of the database administrator.

Figure 3. The Non-Integrated Dual-LAN Design


The second approach is the non-integrated single-LAN design. In this
approach, as shown in Figure 4, MBDS and the OAS share a common LAN. However,
the MBDS controller and backend processors still remain as separate computer
systems in the OAE.

Figure 4. The Non-Integrated Single-LAN Design

The third approach, the partially-integrated design, integrates the
backend processes as permanent background processes into some of the OAS
workstations. The remainder of the MBDS backends are implemented as
user-inaccessible workstations. The mix of the distribution of the backend
processes within the user workstations is controlled by the database
administrator in the OAE. The controller is the key component in MBDS, and should be
devoted to overseeing the management of the database system. Therefore, the
controller software is placed in a separate workstation that is not directly
utilized in the OAS. The partially-integrated design is shown in Figure 5.

Figure 5. The Partially-Integrated Design


In the fourth approach, the isolated-controller design, the MBDS backend
software is integrated into the existing workstations. As in the
partially-integrated design, the controller processes are implemented in a
user-inaccessible workstation. The backend processes are installed as permanent
background processes in one or more workstations. The isolated-controller
design is shown in Figure 6.

Figure 6. The Isolated-Controller Design


In the fifth approach, the fully-integrated design, the MBDS software is
completely integrated into the OAS. The controller processes are installed as
permanent background processes on one workstation. The backend processes are
installed as permanent background processes on one or more workstations. The
fully-integrated design is given in Figure 7.

Figure 7. The Fully-Integrated Design


In the non-integrated dual-LAN design, we are using the OAS LAN as a
logical two-way communications device for MBDS. Messages are passed from the
interface process of a particular workstation to the controller and from the
controller back to the interface process. In the remaining four designs, we
are using the local-area network as a logical five-way communications device.
Messages are passed from the interface process to the controller, from the
controller to the backends, between the backends, from the backends to the
controller, and from the controller back to the interface process.

The trade-offs from one approach to the next depend on various
performance and cost considerations. The non-integrated approaches differ only by
the cost of a LAN, but the corresponding performance gains of the dual-LAN
approach probably outweigh the cost of the extra LAN. In particular, the
burden on the LAN for the OAS is significantly lower in the dual-LAN design.
However, in both these approaches, a high price is paid as the database and
transactions of MBDS grow in size and intensity. The integration of more
backends into MBDS is costly, since the new workstations are only accessible
to the database management system.

In such a situation, either the partially-integrated design or the
isolated-controller design is a feasible alternative. In both cases, keeping
the controller on a non-accessible workstation is a big performance plus. In
the partially-integrated design, as the database size grows, more user
workstations can be configured into the database system. Further, in both cases
when all workstations are being used as backends for MBDS, additional
workstations can be added to either system. In the partially-integrated design, those
workstations can be added as either dedicated database processors or user
workstations. Again, in both cases, the addition of more backends into MBDS is
more cost-effective if the backends are added as user workstations. We feel
that the fully-integrated design is the least desirable. The controller as
part of a user-accessible workstation would substantially degrade the
performance of MBDS as the non-database use of the workstation at which the
controller resides increases.

Overall, the non-integrated dual-LAN design may yield the highest
performance (see Figure 3 again). The performance of the non-integrated single-LAN
and partially-integrated designs is about the same. However, the
partially-integrated design is more versatile and cost-effective. The
isolated-controller design exhibits a moderate performance capability, but excels as a
cost-effective alternative. Finally, while the fully-integrated design is
cost-effective, its performance may leave a lot to be desired.

5. SIX CHARACTERISTICS OF MBDS FOR AN EFFECTIVE ROLE IN THE OAE

Regardless of the integration approach chosen, MBDS exhibits certain
characteristics that are desirable in the OAE. These characteristics include
the software portability of the code, the software independence of the
backend code, the auto-configurability and reconfigurability of MBDS on
account of its use of identical workstations and replicated software, the
survivability of the system resulting from the use of duplicated directory data,
the versatility of the system due to the ability of MBDS to support multiple
language interfaces, and the performance capabilities of the system as a
result of its parallel configuration and round-robin data placement. Each of
these topics is examined in the following sections.


5.1. Software Portability

The MBDS processes, i.e., the controller processes, the backend
processes, and the interface process, are all written using the C programming
language. C was chosen as the programming language for MBDS because of its
portability and its reputation as a good systems programming language. We
estimate that the code of MBDS is about ninety-five percent portable,
consisting of 13,000 lines of C code. The five percent of system-dependent code
involves the inter-process message-passing code on both the VAX and the
PDP-11/44s, the inter-computer message-passing code for the GET and PUT
processes, and the disk I/O routines for the record processing process. Thus, the great
majority of the code is portable. In fact, some of the implementation
development for MBDS takes place on a VAX-11/780 running the Unix operating
system, where we are able to take advantage of the C tools provided by Unix.
Thus, we feel that we have designed a relatively portable database system
that can be implemented on a wide range of the 32-bit microcomputers on the
market today, e.g., the DEC MicroVAX, the Sun Workstation, etc.

5.2.  Software  Independence 

In examining the software independence issue, we focus on the backend
processes. The elegance of MBDS is that the backend software of one backend
is identical to the backend software of another backend. For logical reasons,
the directory data, used by each backend when processing requests, is
nevertheless duplicated at every backend. However, the directory data is
usually a small percentage of the non-directory data. Furthermore, the only
sharing of information by the backends occurs in one phase of the directory
search. Otherwise, the directory management, the concurrency control, and the
record processing processes are independent of each other. So, when a new
backend is configured into the system, the software present on one backend is
simply replicated on the new backend. Additionally, the directory data,
duplicated at an existing backend, is loaded into the new backend. When
bringing a new backend into MBDS, we must also decide whether to rearrange
the non-directory data. On the one hand, we can redistribute all of the
non-directory data across the disk systems of every backend. This involves
reloading the data. On the other hand, we can simply leave the data
undisturbed, loading only new data on the new backend. The choice is left to the
discretion of the database administrator.
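
The two options for the non-directory data when a backend is added can be
contrasted in a short Python sketch. The function name, the list-of-lists
representation of the per-backend data, and the use of round-robin dealing to
model an even reload are assumptions made for illustration; the text does not
prescribe a particular reload procedure.

```python
def add_backend(partitions, redistribute):
    """Return the per-backend record lists after one backend is added.

    partitions   -- list of lists, one list of records per existing backend
    redistribute -- True: reload all records evenly across every backend;
                    False: leave existing data in place, new backend starts empty
    """
    if not redistribute:
        return [list(p) for p in partitions] + [[]]
    all_records = [r for p in partitions for r in p]
    n = len(partitions) + 1
    # Assumption: an even reload is modelled as round-robin dealing.
    return [all_records[i::n] for i in range(n)]


before = [[1, 2, 3], [4, 5, 6]]
undisturbed = add_backend(before, redistribute=False)
reloaded = add_backend(before, redistribute=True)
```

With `redistribute=False` the existing backends keep their three records each
and the new backend receives only future insertions; with `redistribute=True`
every backend ends up with two records, at the cost of moving existing data.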


5.3. Auto-Configurability

One of the most convenient features of MBDS is the ability to
automatically configure and reconfigure the system with ease. When starting the
system for the first time, the database administrator simply specifies, using the
interface, the number of backends in the system. MBDS then configures itself
by notifying the controller and backend processes of the number of backends in
the system. Using this unique feature, MBDS can be reconfigured when a
backend becomes inoperable. In such a situation, MBDS is configured with one less
backend. Conversely, when a new backend is added to the system, the system
can easily be configured with one more backend.

5.4.  Survivability 

MBDS contains only one copy of the non-directory database. When the
database is loaded, it is distributed evenly across all backends' disk
systems. However, the directory data, which contains index and cluster
information on all data in the database, is duplicated in every backend. The
distributed directory data, coupled with the software independence and
reconfigurability of MBDS, offers an increased survivability of the database system in
the OAE. If a backend or backends become inoperable, the system is still
usable. While a backend is inoperable, a log of transactions that modified
both the directory and the non-directory data is kept. When the backend is
reconfigured into MBDS, the log is run for the purpose of updating the
directory and other data. Although portions of the non-directory data become
inaccessible with the inoperable backends, MBDS can still access and retrieve the
rest of the data. Incomplete data is better than no data, provided that the
user is informed of the situation.
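
The log-and-replay behavior described above can be sketched in Python as
follows. This is a simplified illustration under stated assumptions: only the
duplicated directory is modelled, the log is kept per inoperable backend, and
all names (`Backend`, `apply_update`, `rejoin`) are hypothetical rather than
taken from the system itself.

```python
class Backend:
    """Holds a full copy of the directory; may be marked inoperable."""

    def __init__(self):
        self.directory = {}
        self.operable = True
        self.missed = []   # updates logged while this backend was down


def apply_update(backends, key, value):
    """Apply a directory update everywhere; log it for inoperable backends."""
    for backend in backends:
        if backend.operable:
            backend.directory[key] = value
        else:
            backend.missed.append((key, value))


def rejoin(backend):
    """Bring a backend back and replay the updates it missed, in order."""
    for key, value in backend.missed:
        backend.directory[key] = value
    backend.missed.clear()
    backend.operable = True


backends = [Backend(), Backend()]
apply_update(backends, "Census", "cluster-1")
backends[1].operable = False
apply_update(backends, "Census", "cluster-2")   # backend 1 misses this update
stale = dict(backends[1].directory)
rejoin(backends[1])
```

After `rejoin`, the directory copy on the recovered backend matches the copy
on the backend that stayed up, which is the property the log is meant to
restore.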

5.5.  Versatility 

One of the biggest advantages of having MBDS as part of an OAS is the
ability of MBDS to provide support for multiple data models (and therefore
data languages) through the use of multiple language-based interfaces. In the
OAE, where users are from a varied range of backgrounds, such a utility is a
unique feature in a database management system. In fact, the language
interfaces can be tailored by the workstation. One workstation could have an SQL
interface, another a DL/I interface, a third a Daplex interface, and perhaps
still another a CODASYL interface. By tailoring the language interfaces
by workstation, the software required for each interface process could be
reduced. Conversely, with a wide range of language interfaces available at
every workstation, the workstation becomes more accessible to a wide range of
users.

5.6.  Performance 

The performance capabilities of any DBMS are important in an OAE, since
the DBMS tends to serve as a repository of all the permanent data and programs
of the OAE. As the repository becomes large and the database activities
increase, the DBMS as a database server may become the performance bottleneck.
However, MBDS is specifically designed to provide for capacity growth and
performance enhancement. The performance metric of major concern is the response
time of a request. The response time of a request is the time between the
initial issuance of the request and the receipt of the final results for the
request. MBDS has two original design goals. First, if the database capacity
is fixed and the number of backends is increased, then the response time per
request is reduced proportionately. For example, if a request had a response
time of 60 seconds when there is one backend, the same request would have a
response time of nearly 30 seconds when there are two backends, and of nearly
15 seconds when there are four backends, provided that the database size has
remained constant.

The second goal states that, for the same requests, if the response
sets are increased due to an increase of the database size and the number of
backends is increased in proportion to the increase of the response set, then the
response time per request remains the same. For example, if a request had a
response time of 60 seconds when there is one backend with 1000 records in the
response set, then the same request would have a response time of close to 60
seconds when there are two backends and 2000 records in the response set. The
underlying concept in each goal is that MBDS in the OAE would supply a
database system that would grow as the OAS grows, and would either
maintain a constant response time per request by 'growing' its backends or
halve a given response time per request by 'doubling' its backends. On the
basis of our preliminary analysis, the operational MBDS can indeed meet the
two goals. The analysis is documented in [Teka84].
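
Both goals amount to the same idealized proportionality, which the following
Python sketch makes explicit. The function and its parameter names are
hypothetical, and the model deliberately ignores controller and bus overhead,
which is why the text speaks of response times of "nearly" 30 seconds and
"close to" 60 seconds rather than exact figures.

```python
def ideal_response_time(base_time, base_backends, backends, size_ratio=1.0):
    """Idealized response time under the two design goals.

    base_time     -- response time measured with base_backends backends
    base_backends -- number of backends in the measured configuration
    backends      -- number of backends in the new configuration
    size_ratio    -- new response-set size divided by the original size
    """
    return base_time * size_ratio * base_backends / backends


# Goal one: fixed database, 1, 2 and 4 backends.
goal_one = [ideal_response_time(60, 1, n) for n in (1, 2, 4)]
# Goal two: response set doubles (1000 -> 2000 records), backends double.
goal_two = ideal_response_time(60, 1, 2, size_ratio=2.0)
```

Under this model the first goal yields 60, 30 and 15 seconds for one, two and
four backends, and the second goal leaves the response time at 60 seconds.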

6.  CONCLUSIONS 

We have shown how MBDS can play an important role in the OAE as an
Specifically, we have shown how MBDS can provide both traditional and new

APPENDIX A: THE MULTI-BACKEND DATABASE SYSTEM

In this appendix we examine the structure of the multi-backend database
system, focusing on the data model, i.e., the attribute-based data model, the
data language, i.e., the attribute-based data language (ABDL), the process
structure, the system configurations, and the ability of MBDS to support
multiple data models and database languages.


A.1 The Attribute-Based Data Model


In the attribute-based data model, data is modeled with the constructs:
database, file, record, attribute-value pair, directory keyword, directory,
record body, keyword predicate, and query. Informally, a database consists of
a collection of files. Each file contains a group of records which are
characterized by a unique set of directory keywords. A record is composed of
two parts. The first part is a collection of attribute-value pairs or
keywords. An attribute-value pair is a member of the Cartesian product of the
attribute name and the value domain of the attribute. As an example,
<POPULATION, 25000> is an attribute-value pair having 25000 as the value for
the population attribute. A record contains at most one attribute-value pair
for each attribute defined in the database. Certain attribute-value pairs of a
record (or a file) are called the directory keywords of the record (file),
because either the attribute-value pairs or their attribute-value ranges are
kept in a directory for addressing the record (file). Those attribute-value
pairs which are not kept in the directory for addressing the record (file) are
called non-directory keywords. The rest of the record is textual information,
which is referred to as the record body. An example of a record is shown
below.


The angle brackets, <,>, enclose an attribute-value pair, i.e., keyword. The
curly brackets, {,}, include the record body. The first attribute-value pair
of all records of a file is the same. In particular, the attribute is FILE
and the value is the file name. A record is enclosed in parentheses. For
example, the above sample record is from the Census file.
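
As an illustration of the record structure just described, the following
sketch models a record as a collection of attribute-value pairs plus a
textual body. It is a minimal sketch under stated assumptions: the class
and function names are hypothetical, the record body text is invented, and
the printed layout only follows the bracket conventions given above.

```python
from dataclasses import dataclass, field


@dataclass
class Record:
    """A record: attribute-value pairs (keywords) plus a textual body."""
    keywords: dict = field(default_factory=dict)
    body: str = ""

    def add_keyword(self, attribute, value):
        # A record holds at most one attribute-value pair per attribute.
        if attribute in self.keywords:
            raise ValueError(f"duplicate attribute: {attribute}")
        self.keywords[attribute] = value

    @property
    def file_name(self):
        # The value of the FILE attribute names the file of the record.
        return self.keywords["FILE"]


def make_record(file_name, pairs, body=""):
    """Build a record whose first attribute-value pair is <FILE, file_name>."""
    record = Record()
    record.add_keyword("FILE", file_name)
    for attribute, value in pairs:
        record.add_keyword(attribute, value)
    record.body = body
    return record


def format_record(record):
    """Render a record with angle brackets for pairs and braces for the body."""
    pairs = ", ".join(f"<{a}, {v}>" for a, v in record.keywords.items())
    return f"({pairs}, {{{record.body}}})"


if __name__ == "__main__":
    # Hypothetical record; the body text is a placeholder.
    sample = make_record(
        "Census",
        [("CITY", "Monterey"), ("POPULATION", 25000)],
        "sample text",
    )
    print(format_record(sample))
```

A dictionary is used for the keywords because it enforces the rule of one
pair per attribute and, in current Python versions, preserves insertion
order, so the FILE pair stays first.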

The database is accessed by indexing on directory keywords using keyword
predicates. A keyword predicate is a tuple consisting of an attribute, a
relational operator (=, !=, >, <, >=, <=), and an attribute value, e.g.,
POPULATION >= 20000 is a keyword predicate. More specifically, it is a
greater-than-or-equal-to predicate. Combining keyword predicates in
disjunctive normal form characterizes a query of the database. The query

will be satisfied by all records of the Census file with the CITY of either
Monterey or San Jose. For clarity, we also employ parentheses for bracketing
predicates in a query.
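
The evaluation of keyword predicates and of queries in disjunctive normal
form can be sketched as follows. This is an illustrative sketch, not the
MBDS implementation: records are plain dictionaries, a predicate is a
(attribute, operator, value) tuple, and a query is a list of conjunctions.
The names and the sample query are assumptions based on the description
above.

```python
import operator

# The six relational operators allowed in a keyword predicate.
OPERATORS = {
    "=": operator.eq,
    "!=": operator.ne,
    ">": operator.gt,
    "<": operator.lt,
    ">=": operator.ge,
    "<=": operator.le,
}


def satisfies_predicate(record, predicate):
    """True if the record has the attribute and the comparison holds."""
    attribute, op, value = predicate
    if attribute not in record:
        return False
    return OPERATORS[op](record[attribute], value)


def satisfies_query(record, query):
    """A query is a disjunction (outer list) of conjunctions (inner lists)."""
    return any(
        all(satisfies_predicate(record, predicate) for predicate in conjunction)
        for conjunction in query
    )


# Census records whose CITY is either Monterey or San Jose.
census_query = [
    [("FILE", "=", "Census"), ("CITY", "=", "Monterey")],
    [("FILE", "=", "Census"), ("CITY", "=", "San Jose")],
]

if __name__ == "__main__":
    record = {"FILE": "Census", "CITY": "Monterey", "POPULATION": 25000}
    print(satisfies_query(record, census_query))
    print(satisfies_predicate(record, ("POPULATION", ">=", 20000)))
```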

A.2 The Attribute-Based Data Language (ABDL)

The ABDL supports the four primary database operations, INSERT, DELETE,
UPDATE, and RETRIEVE. A request in the ABDL is a primary operation with a
qualification. A qualification is used to specify the information of the
database that is to be operated on. Two or more requests grouped together
characterize a transaction. Now, let us briefly examine the four types of
requests.

The INSERT request is used to insert a new record into the database. The
qualification of an INSERT request is a list of keywords which describe the
record being inserted. Example 2.1 contains an INSERT request

Example 2.1: INSERT (

that will insert a record into the Computer Science Department file for the
employee Hsiao with a salary of $50,000.


A DELETE request is used to remove record(s) from the database. The
qualification of a DELETE request is a query. Example 2.2 is a request that

Example 2.2: DELETE ( (FILE = Computer Science Department) ∧

would delete all records whose salary is greater than $100,000 in the Computer
Science Department file.

An UPDATE request is used to modify records of the database. The
qualification of an UPDATE request consists of two parts, the query and the
modifier. The query specifies which records of the database are to be
modified. The modifier specifies how the records being modified are to be
updated. Example 2.3 is an UPDATE request that

Example 2.3: UPDATE ( (FILE = Computer Science Department) ) (SALARY = SALARY + $5,000)

will modify all records of the Computer Science Department file by increasing
all salaries by $5,000. In this example, ( (FILE = Computer Science
Department) ) is the query and (SALARY = SALARY + $5,000) is the modifier.

Lastly, the RETRIEVE request is used to retrieve records of the database.
The qualification of a RETRIEVE request consists of a query, a target-list,
and a BY_clause. The query specifies which records are to be retrieved. The
target-list is a list of output attributes. An aggregate operation, i.e.,
AVG, COUNT, SUM, MIN, MAX, may be applied to one or more attributes in the
target-list. The optional BY_clause may be used to group records when an
aggregate operation is specified. The RETRIEVE request in Example 2.4 will
retrieve

Example 2.4: RETRIEVE ( (FILE = Computer Science Department) ∧ (CITY = Monterey) ) (NAME)

the employee names of all records in the Computer Science Department file with
city being Monterey. ( (FILE = Computer Science Department) ∧
(CITY = Monterey) ) is the query and (NAME) is the target-list.

Obviously, ABDL is considerably more complete than the aforementioned
examples have shown. For our purpose, these examples will suffice.
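
The effect of the four request types can be sketched with a small in-memory
store. This is a hedged illustration only: the class AttributeStore and its
methods are hypothetical, queries and modifiers are passed as Python
functions rather than parsed from ABDL text, and an aggregate is applied to
the first target attribute only.

```python
AGGREGATES = {
    "AVG": lambda values: sum(values) / len(values),
    "COUNT": len,
    "SUM": sum,
    "MIN": min,
    "MAX": max,
}


class AttributeStore:
    """A toy store of attribute-based records kept as dictionaries."""

    def __init__(self):
        self.records = []

    def insert(self, keywords):
        # INSERT: the qualification is the list of keywords of the new record.
        self.records.append(dict(keywords))

    def delete(self, query):
        # DELETE: remove every record satisfying the query.
        before = len(self.records)
        self.records = [r for r in self.records if not query(r)]
        return before - len(self.records)

    def update(self, query, modifier):
        # UPDATE: apply the modifier to every record satisfying the query.
        count = 0
        for record in self.records:
            if query(record):
                modifier(record)
                count += 1
        return count

    def retrieve(self, query, target_list, aggregate=None, by=None):
        # RETRIEVE: project the target-list, optionally aggregate and group.
        selected = [r for r in self.records if query(r)]
        if aggregate is None:
            return [tuple(r.get(a) for a in target_list) for r in selected]
        groups = {}
        for record in selected:
            key = record.get(by) if by is not None else None
            groups.setdefault(key, []).append(record[target_list[0]])
        return {key: AGGREGATES[aggregate](vals) for key, vals in groups.items()}


if __name__ == "__main__":
    cs = "Computer Science Department"
    store = AttributeStore()
    store.insert({"FILE": cs, "NAME": "Hsiao", "SALARY": 50000, "CITY": "Monterey"})

    def in_cs(record):
        return record.get("FILE") == cs

    def raise_salary(record):
        record["SALARY"] += 5000

    store.update(in_cs, raise_salary)
    print(store.retrieve(lambda r: in_cs(r) and r.get("CITY") == "Monterey", ["NAME"]))
```

The CITY value attached to the inserted record is an assumption made so that
the retrieval has something to match.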

A.3 The Process Structure

Currently, MulBac/DBS does not communicate with a host machine. The
absence of this communication requires that the test interface process, the
process used to interact with MulBac/DBS, be placed in the MulBac/DBS
controller. In this section we describe the process structure of MulBac/DBS.
First we present the test interface process, which is used to access the
system. Next, we review the processes of the controller. Finally, we describe
the processes of each backend.

A.3.1 The Test Interface Process

The test interface process is a menu-driven interface to MulBac/DBS. The
main actions of the test interface are loading a database, generating a
database, and executing the request interface. When executing the request
interface, the user has the option to choose a new database to work with,
create a new list of traffic units, modify an existing list of traffic units,
select traffic units from an existing list for execution, select an existing
list so that all traffic units on the list may be executed, or specify the
display mode of the results.

A.3.2 The Processes of the Controller

The controller is composed of three processes: request preparation,
insert information generation, and post processing. Request preparation
receives, parses and formats a request (transaction) before sending the
formatted request (transaction) to the directory management process in each
backend. Insert information generation is used to provide additional
information to the backends when an insert request is received. Since the data
is distributed, the insert only occurs at one of the backends. Thus this
process must determine the backend at which the insert will occur, along with
certain directory information. Post processing is used to collect all the
results of a request (transaction) and forward the information back to the
host computer.

A.3.3 The Processes of Each Backend

Each backend is also composed of three processes, which are different from
the controller processes. They are: directory management, concurrency
control, and record processing. Directory management performs the search of
the directory structure to determine the secondary storage addresses
necessary to access the clustered records. Concurrency control determines
when the request can be executed. Record processing performs the operation
specified by the request.
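
The division of labor between the controller and the backends can be
sketched in a few lines. The sketch rests on several assumptions that are
not taken from the system itself: the backend for an insert is chosen
round-robin, a request is an (operation, payload) pair, and directory
management and concurrency control are reduced to a direct scan of the
records held by each backend. All class and method names are hypothetical.

```python
import itertools


class Backend:
    """One backend holding its own share of the records."""

    def __init__(self):
        self.records = []

    def execute(self, request):
        operation, payload = request
        if operation == "INSERT":
            self.records.append(dict(payload))
            return []
        if operation == "RETRIEVE":
            # Record processing: scan the local records for matches.
            return [r for r in self.records if payload(r)]
        raise ValueError(f"unsupported operation: {operation}")


class Controller:
    """Sends requests to the backends and collects their results."""

    def __init__(self, backends):
        self.backends = list(backends)
        # Assumption: inserts are spread over the backends round-robin.
        self._next_backend = itertools.cycle(range(len(self.backends)))

    def submit(self, request):
        operation, _ = request
        if operation == "INSERT":
            # Insert information generation: the insert occurs at one backend.
            target = self.backends[next(self._next_backend)]
            return target.execute(request)
        # Request preparation: every backend receives the same request.
        partial_results = [backend.execute(request) for backend in self.backends]
        # Post processing: merge the partial results into one response.
        return [record for part in partial_results for record in part]


if __name__ == "__main__":
    controller = Controller([Backend(), Backend()])
    for population in (10000, 25000, 40000, 70000):
        controller.submit(("INSERT", {"FILE": "Census", "POPULATION": population}))
    large = controller.submit(("RETRIEVE", lambda r: r["POPULATION"] >= 20000))
    print(len(large))
```

Spreading the records evenly is what lets each backend search only its own
share of the data for a given request.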

A.4 The Current and Future Configurations

The current hardware configuration of MBDS consists of a VAX-11/780
running as the controller and two PDP-11/44s running as backends.
Communication between computers in MBDS is achieved by using a
time-division-multiplexed bus called the parallel communication link (PCL-11B)
[DEC A#]. There are a total of three PCLs in the configuration, two from the
VAX-11/780 to the PDP-11/44s, and one between the two PDP-11/44s. When the
implementation of MBDS began in 1980, the required broadcast bus was not
available. Even though we required a broadcast bus for our design, the PCL was
chosen. The VAX-11/780 runs the VMS operating system, with the PDP-11/44s
running the RSX-11M operating system.

The VAX-11/780 serves a dual purpose in the current configuration, as
both the host computer and the controller. In addition to the controller
processes described in Section A.3, we have also implemented the interface
process on the VAX. Given the large virtual and primary memory capacities of
the VAX, we felt that the additional overhead of running the interface process
in the controller would not be substantial. The PDP-11/44s contain only the
backend processes. Plans are being made to replace the PCL-11Bs with an
Ethernet-like broadcast bus and the VAX-11/780 and PDP-11/44s with
microprocessor-based CPUs and Winchester-type disk systems, and to increase
the number of backends and their disk systems to six.

A.5 Supporting Multiple Language Based Interfaces

Typically, the design and implementation of a conventional database
system begins with the choice of a data model, the specification of a
model-based data language, and the design and implementation of a database
system which controls and executes the transactions written in the data
language. Thus, we have the relational model, the SQL language and the
SQL/Data System. Similarly, we have the hierarchical model, the DL/I language
and the IMS system. We may also have the case of the CODASYL model, language
and system. The conventional approach to the design and implementation of a
system is limited to a single data model, a specific data language and a
homogeneous database system. However, the attribute-based model and the
attribute-based data language of the multi-backend database system (MBDS) are
sufficiently powerful and high-level that they can support multiple data
models and several model-based languages as if the system were a heterogeneous
collection of database systems.

This unconventional design and implementation approach reveals two
important database concepts. First, the attribute-based model is an
exceedingly simple yet powerful data model, such that many other data models
may be realized easily by using this data model. Second, the data language of
MBDS, i.e., the attribute-based data language ABDL, consists of high-level and
primary operations, such that most of the other model-based language
constructs can be mapped into ABDL in a straightforward fashion. There could
be an SQL interface so that the transactions written in SQL can be carried out
by MBDS. The execution of the transactions requires the SQL constructs to be
transformed into the primary operations of ABDL through the interface.
Similarly, there could be a DL/I interface so that the transactions written in
DL/I can also be carried out by the interface. In this way, the single
database system and multiple interfaces allow the system to support multiple
data models and data languages as if it were a heterogeneous collection of
database systems. In practice, we can construct a number of interfaces to
support relational, hierarchical, and network operations with a minimal
effort. Such an approach is clearly an attractive alternative to the approach
where separate, stand-alone systems must be developed for specific models.

The construction of a relational, hierarchical, or network interface to
MBDS is carried out at both the database and data language levels. At the
database level, the series of papers [Bane78a, Bane78b, Bane80] demonstrated
that a relational, hierarchical, or network database can be converted into an
attribute-based database. At the data language level, we focus on the
development of language interfaces to the attribute-based system consistent
with the user's chosen language. At this level, we address three issues. The
first issue is to determine how the operations of the chosen language can be
implemented using the operations of MBDS. The second issue is the translation
of the language of the interface to the attribute-based data language. The
third issue is the placement of the language interface within MBDS.
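
The second issue, the translation of an interface language into ABDL, can be
illustrated with a deliberately small sketch. It is not the translation
designed for MBDS: it handles only a single-table SELECT with an optional
WHERE clause of AND-connected comparisons, it assumes that a relation maps
to a file of the same name, and it writes the conjunction as the word AND.
The function name and the table name in the usage example are hypothetical.

```python
import re

# SELECT <columns> FROM <table> [WHERE <conditions>]
_SELECT = re.compile(
    r"^\s*SELECT\s+(?P<cols>[\w\s,]+?)\s+FROM\s+(?P<table>\w+)"
    r"(?:\s+WHERE\s+(?P<where>.+?))?\s*;?\s*$",
    re.IGNORECASE,
)
# <attribute> <operator> <value>, where the value may be single-quoted
_CONDITION = re.compile(r"^\s*(\w+)\s*(=|!=|>=|<=|>|<)\s*'?([^']+?)'?\s*$")


def sql_select_to_abdl(sql):
    """Translate a very small subset of SQL SELECT into an ABDL-style RETRIEVE."""
    match = _SELECT.match(sql)
    if match is None:
        raise ValueError("unsupported SQL statement")
    # Assumption: the relation name becomes the value of the FILE attribute.
    predicates = [f"(FILE = {match.group('table')})"]
    where = match.group("where")
    if where:
        for condition in re.split(r"\s+AND\s+", where, flags=re.IGNORECASE):
            parsed = _CONDITION.match(condition)
            if parsed is None:
                raise ValueError(f"unsupported condition: {condition}")
            attribute, op, value = parsed.groups()
            predicates.append(f"({attribute.upper()} {op} {value})")
    targets = ", ".join(c.strip().upper() for c in match.group("cols").split(","))
    return f"RETRIEVE ({' AND '.join(predicates)}) ({targets})"


if __name__ == "__main__":
    print(sql_select_to_abdl("SELECT name FROM Employee WHERE city = 'Monterey'"))
```

Because every condition becomes one keyword predicate and the relation name
becomes a FILE predicate, the result is a single conjunction, which is
already in disjunctive normal form.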

Our current work on language interfaces to MBDS is at the design level.
The two interfaces we have designed are for SQL [Macy84, Roll84] and for DL/I
[Weis84]. To facilitate the development of the SQL interface, we have also
developed algorithms to implement sort and merge operations in MBDS
[Muld84]. Using these designs, we plan on implementing the two interfaces in
the coming months. In addition, we will be publishing a number of papers on
interface development and implementation. It is sufficient to say that
database support in an office automation environment should be multi-lingual
in database management.